Transposition Invariant Pattern Matching for Multi-Track Strings
Abstract
We consider the problem of multi-track string matching. The task is to find the occurrences of a pattern across parallel strings. Given an alphabet $\Sigma$ of natural numbers and a set $S$ over $\Sigma$ of $h$ strings $s^i = s^i_1 \cdots s^i_n$ for $i = 1, \ldots, h$, a pattern $p = p_1 \cdots p_m$ has such an occurrence at position $j$ of $S$ if $p_1 = s^{i_1}_j,\ p_2 = s^{i_2}_{j+1},\ \ldots,\ p_m = s^{i_m}_{j+m-1}$ holds for some $i_1, \ldots, i_m \in \{1, \ldots, h\}$. An application of the problem is music retrieval, where occurrences of a monophonic query pattern are searched for in a polyphonic music database. In music retrieval it is even more pertinent to allow invariance under pitch-level transpositions, i.e., the task is to find whether there are occurrences of $p$ in $S$ such that the formulation above becomes $p_1 = s^{i_1}_j + c,\ p_2 = s^{i_2}_{j+1} + c,\ \ldots,\ p_m = s^{i_m}_{j+m-1} + c$ for some constant $c$. We present several algorithms solving the problem. Our main contribution, the MP algorithm, is a transposition-invariant bit-parallel filtering algorithm for static databases. After an $O(nhe)$ time preprocessing, it finds candidates for transposition-invariant occurrences in time $O(n\lceil m/w \rceil + m + d)$, where $w$, $e$, and $d$ denote the size of the machine word in bits and two factors dependent on the size of the alphabet, respectively. A straightforward algorithm is used to check whether the candidates are proper occurrences. The algorithm needs time $O(hm)$ per candidate.
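The occurrence definitions in the abstract can be made concrete with a small checker. The following is a minimal sketch only, not the MP filtering algorithm and not necessarily the checking routine the paper uses: the function names `occurs_at`, `transposition_at`, and `find_transposed` are hypothetical, positions are 0-based (the abstract counts from 1), and all tracks are assumed to have the same length n. For each candidate position it collects the set of symbols available in each column and tries every constant c that the first pattern symbol permits.

```python
from typing import List, Optional, Sequence


def occurs_at(tracks: Sequence[Sequence[int]], pattern: Sequence[int], j: int) -> bool:
    """Exact multi-track occurrence: every pattern symbol p_k must equal the
    symbol at position j + k of at least one track (0-based j and k)."""
    m, n = len(pattern), len(tracks[0])
    if j < 0 or j + m > n:
        return False
    return all(any(track[j + k] == pattern[k] for track in tracks) for k in range(m))


def transposition_at(tracks: Sequence[Sequence[int]], pattern: Sequence[int], j: int) -> Optional[int]:
    """Return a constant c such that p_k = s + c for some track symbol s at
    position j + k, for every k; return None if no such c exists."""
    m, n = len(pattern), len(tracks[0])
    if m == 0 or j < 0 or j + m > n:
        return None
    # Symbols available at each of the m aligned positions, over all h tracks.
    columns = [{track[j + k] for track in tracks} for k in range(m)]
    # The first pattern symbol fixes at most h candidate values for c.
    for c in sorted(pattern[0] - s for s in columns[0]):
        if all(pattern[k] - c in columns[k] for k in range(m)):
            return c
    return None


def find_transposed(tracks: Sequence[Sequence[int]], pattern: Sequence[int]) -> List[int]:
    """All 0-based positions with a transposition-invariant occurrence."""
    n, m = len(tracks[0]), len(pattern)
    return [j for j in range(n - m + 1) if transposition_at(tracks, pattern, j) is not None]


# Illustrative data (assumed, not from the paper): two tracks of MIDI-like pitches.
tracks = [[60, 62, 64, 65],
          [48, 50, 52, 53]]
print(occurs_at(tracks, [60, 50, 64], 0))         # pattern hops between the two tracks
print(transposition_at(tracks, [63, 53, 67], 0))  # same pattern shifted by a constant
print(find_transposed(tracks, [63, 53, 67]))
```

In this sketch the per-position cost is dominated by building the m column sets and testing at most h candidate constants against them, so hashing keeps it close to proportional to hm; that is an observation about the sketch, not a claim about the paper's analysis.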
Similar resources
Restricted Transposition Invariant Approximate String Matching Under Edit Distance
Let A and B be strings with lengths m and n, respectively, over a finite integer alphabet. Two classic string matching problems are computing the edit distance between A and B, and searching for approximate occurrences of A inside B. We consider the classic Levenshtein distance, but the discussion also applies to indel distance. A relatively new variant [8] of string matching, motivated in...
KMP Based Pattern Matching Algorithms for Multi-Track Strings
A multi-track string is an N-tuple of strings of length n. For two multi-track strings T = (t1, t2, . . . , tN) of length n and P = (p1, p2, . . . , pM) of length m, permuted pattern matching is the problem of finding all positions i such that P is a permuted match of T[i : i+M]. We propose three new algorithms for permuted pattern matching based on the KMP algorithm. The first algorithm is an exact mat...
Position Heaps for Permuted Pattern Matching on Multi-Track String
A multi-set of N strings of length n is called a multi-track string. Permuted pattern matching is the problem that, given two multi-track strings T = {t1, . . . , tN} of length n and P = {p1, . . . , pN} of length m, outputs all positions i such that {p1, . . . , pN} = {t1[i : i+m−1], . . . , tN[i : i+m−1]}. We propose two new indexing structures for multi-track strings. One is a time-efficien...
Searching Monophonic Patterns within Polyphonic Sources
The string matching problem, in which one should find the occurrences of a pattern string within a text, is well studied in the literature. The problem can be solved efficiently, e.g., by using so-called bit-parallel algorithms. We adapt the bit-parallel approach to music information retrieval. We consider a situation where the pattern is monophonic and the text (the musical sou...
Transposition invariant string matching
Given strings $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$ over an alphabet $\Sigma \subseteq U$, where $U$ is some numerical universe closed under addition and subtraction, and a distance function $d(A, B)$ that gives the score of the best (partial) matching of $A$ and $B$, the transposition invariant distance is $\min_{t \in U} \{d(A + t, B)\}$, where $A + t = (a_1 + t)(a_2 + t) \ldots (a_m + t)$. We study the problem of computing the transpo...
Journal title:
- Nord. J. Comput.
Volume 10
Publication date: 2003